Resolve PagerDuty incidents after a rerun completes normally

Hi,
We are using PagerDuty notifications to alert on Tidal job failures. Right now, when a Tidal job fails, it is rerun multiple times and will probably complete successfully. We want to mark the alerts/notifications as resolved in PagerDuty so that we don’t have to re-check the Tidal jobs manually.

The steps might be as below:
Scan PagerDuty incidents and, for each incident:

  1. get the Tidal job name
  2. check whether the job has already succeeded for the incident date
  3. resolve the incident if the job succeeded

How can we enable this functionality, or what would the solution be?
Can you please give me some instructions?

Hi Xinwei,

It sounds like you should use our REST API to help you with this.

  1. You can use the GET /incidents endpoint to scan your incidents and get the Tidal job name
  2. Checking whether the job has already succeeded would be done in Tidal, correct?
  3. If step #2 is correct, then you can resolve incidents with the PUT /incidents/{id} endpoint if you’d like to.
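
For illustration, steps 1 and 3 could be sketched in Python roughly as below. This is a minimal sketch, not an official client: the helper names are made up, the `statuses[]` and `team_ids[]` query parameters and the `From` header (the email of a valid PagerDuty user) are assumptions based on the REST API v2 reference, and the token, team ID and email are placeholders.

```python
import json
import urllib.request
from urllib.parse import urlencode

API = "https://api.pagerduty.com"


def _headers(token):
    # Same headers as the curl examples in this thread.
    return {
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Authorization": f"Token token={token}",
    }


def list_incidents_request(token, statuses=("triggered", "acknowledged"), team_ids=()):
    """Build a GET /incidents request for open incidents (step 1)."""
    params = [("statuses[]", s) for s in statuses]
    params += [("team_ids[]", t) for t in team_ids]
    url = f"{API}/incidents?{urlencode(params)}"
    return urllib.request.Request(url, headers=_headers(token), method="GET")


def resolve_incident_request(token, incident_id, from_email):
    """Build a PUT /incidents/{id} request that resolves one incident (step 3)."""
    body = {"incident": {"type": "incident_reference", "status": "resolved"}}
    headers = _headers(token)
    headers["Content-Type"] = "application/json"
    headers["From"] = from_email  # assumption: must be a valid user's email
    return urllib.request.Request(
        f"{API}/incidents/{incident_id}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="PUT",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Step 2 (checking the job status in Tidal) would sit between the two calls and is not shown, since it depends on how your Tidal setup exposes job results.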

You can read about creating a REST API key in this guide.

I hope this helps, but please let me know if you have further questions.

Thanks,

Jay

Hi Jay,

Thanks for your instructions, this helps a lot.

I created an API key for myself, but I have another question.
I’m trying step 1 by running this command:
curl -X GET --header 'Accept: application/vnd.pagerduty+json;version=2' --header 'Authorization: Token token=' 'https://api.pagerduty.com/notifications?time_zone=UTC&since=01%3A00&until=23%3A00&include[]=users'

However, I got 0 notifications, even though we have incidents right now.
The result is:
{"notifications":[],"limit":25,"offset":0,"more":false,"total":null}

I’m using the Enterprise version, by the way.

I’m confused. Can you help me with this?

Thanks!

Hi Xinwei,

Glad I could help!

At an initial glance, it looks like the since and until values in the request URL aren’t types that we’d expect. We expect ISO 8601 as outlined on our type guide here.

If you can confirm the since and until values are in ISO 8601 format, could you please write in to support@pagerduty.com with the last 4 characters of the REST API key you used, along with the curl command and a rough timestamp for when the event was sent?
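
For illustration, here is a minimal Python sketch of building the same /notifications query with ISO 8601 `since` and `until` values. The helper name `notifications_url` and the example dates are made up for this sketch; the query parameters are the ones from the curl command earlier in this thread.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def notifications_url(since, until):
    """Build a /notifications URL with ISO 8601 since/until values."""
    params = [
        ("time_zone", "UTC"),
        ("since", since.isoformat()),  # e.g. 2018-07-27T01:00:00+00:00
        ("until", until.isoformat()),
        ("include[]", "users"),
    ]
    # urlencode percent-encodes the ':' and '+' characters in the timestamps.
    return "https://api.pagerduty.com/notifications?" + urlencode(params)


# Example dates only; use the window you actually want to query.
since = datetime(2018, 7, 27, 1, 0, tzinfo=timezone.utc)
until = datetime(2018, 7, 27, 23, 0, tzinfo=timezone.utc)
print(notifications_url(since, until))
```

The point is that `since` and `until` carry a full date, time and UTC offset instead of a bare `01:00` or `23:00`.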

Thanks,

Jay

Hi Jay,

This helps!

If I’m only looking for incidents from one team, how can I filter by team name?

Thanks

Also, I’m only looking for unresolved incidents.
Thanks

Hi Xinwei,

Unfortunately, the /notifications endpoint doesn’t have a filter for teams or unresolved incidents.

You’d have to add logic in your code to filter for team and open incidents from the API response.
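
As a rough idea of what that filtering logic could look like, here is a minimal Python sketch. It assumes each item is an incident-style dict carrying a `status` field and a `teams` list whose entries have a `summary` (the team name). That shape is an assumption for illustration and is not shown in this thread, and the function name is made up.

```python
OPEN_STATUSES = {"triggered", "acknowledged"}


def open_incidents_for_team(incidents, team_name):
    """Return the incidents that are still open and belong to the named team."""
    matches = []
    for incident in incidents:
        if incident.get("status") not in OPEN_STATUSES:
            continue  # skip resolved incidents
        team_names = {team.get("summary") for team in incident.get("teams", [])}
        if team_name in team_names:
            matches.append(incident)
    return matches


# Hypothetical sample data, not real API output.
sample = [
    {"id": "P1", "status": "triggered", "teams": [{"summary": "Data Platform"}]},
    {"id": "P2", "status": "resolved", "teams": [{"summary": "Data Platform"}]},
    {"id": "P3", "status": "acknowledged", "teams": [{"summary": "Web"}]},
]
print([i["id"] for i in open_incidents_for_team(sample, "Data Platform")])
```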

I hope this helps.

Cheers,

Jay

Hi Jay,

Ok I got it.
The format of a notification looks like this:
{
  "id": "PWL7QXS",
  "type": "phone_notification",
  "started_at": "2013-03-06T15:28:51-05:00",
  "address": "+15555551234",
  "user": {
    "id": "PT23IWX",
    "type": "user",
    "summary": "Tim Wright",
    "self": "…",
    "html_url": "…"
  }
}

How can I get details for this incident? In a terminal, I ran curl on the html_url, and it turned out to redirect to the sign_in page:

curl <html_url>

<html><body>You are being <a href="https://fmsops.pagerduty.com/sign_in">redirected</a>.</body></html>

How can I solve this?
Thanks!

Hi Hope,

You won’t be able to send a curl request to the URL in the web app.

Instead, I recommend referencing our “Get an Incident” endpoint here: https://api-reference.pagerduty.com/#!/Incidents/get_incidents_id

The curl command will look like this:

curl -X GET --header 'Accept: application/vnd.pagerduty+json;version=2' --header 'Authorization: Token token=<YOUR API TOKEN HERE>' 'https://api.pagerduty.com/incidents/<INCIDENT ID HERE>'

I hope this helps - let us know if you have any other questions!

Hi @Xinwei_Wang,

What if you just used the Events API instead?

You could use a “dedup_key” value of the form “tidal_jobname_timestamp”, e.g.

{
  "routing_key": "xxxx",
  "event_action": "trigger",
  "dedup_key": "tidal_jobcalledbob_2018-07-27",
  "payload": {
    "summary": "Tidal Job: Bob 2018-07-27 Failed",
    "severity": "critical",
    "source": "Tidal"
  }
}

and then when you find the success, send the resolve:

{
  "routing_key": "xxxx",
  "event_action": "resolve",
  "dedup_key": "tidal_jobcalledbob_2018-07-27"
}

There’s a rate limit on the Events API but it’s probably much easier than polling for an existing incident and trying to resolve it (because you can just post on every success).
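
A minimal Python sketch of this approach might look like the following. The function names are made up for illustration, the routing key is a placeholder, and the endpoint is assumed to be the Events API v2 enqueue URL.

```python
import json
import urllib.request

# Assumption: Events API v2 enqueue endpoint.
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def dedup_key(job_name, run_date):
    """Build one stable key per job and run date, e.g. tidal_jobcalledbob_2018-07-27."""
    return f"tidal_{job_name}_{run_date}"


def trigger_event(routing_key, job_name, run_date):
    """Event to send when the Tidal job fails."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key(job_name, run_date),
        "payload": {
            "summary": f"Tidal Job: {job_name} {run_date} Failed",
            "severity": "critical",
            "source": "Tidal",
        },
    }


def resolve_event(routing_key, job_name, run_date):
    """Event to send when the same job later succeeds."""
    return {
        "routing_key": routing_key,
        "event_action": "resolve",
        "dedup_key": dedup_key(job_name, run_date),
    }


def send_event(event):
    """POST one event to the Events API and decode the JSON response."""
    request = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Because the trigger and the resolve are built from the same job name and date, they share a dedup_key, which is what lets the resolve find the incident opened by the trigger.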

Cheers,
Simon
